{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Math23k Analysis Report"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Description\n",
    "| Field             | Annotation                                          |\n",
    "| --------          | --------------------------------------------------- |\n",
    "| id                | Id of the problem |\n",
    "| original_text\t    | Original text of the problem |\n",
    "| equation          | Solution to the problem |\n",
    "| segmented_text    | Chinese word segmentation of the problem |\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import jieba\n",
    "\n",
    "import plotly.express as px\n",
    "from plotly.subplots import make_subplots\n",
    "import plotly.graph_objs as go"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "path1 = \"./math23k/raw/train23k.json\"\n",
    "path2 = \"./math23k/raw/test23k.json\"\n",
    "path3 = \"./math23k/raw/valid23k.json\"\n",
    "\n",
    "data  = pd.read_json(path1, orient='records')\n",
    "data2 = pd.read_json(path2, orient='records')\n",
    "data3 = pd.read_json(path3, orient='records')\n",
    "data = pd.concat([data, data2, data3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Record Examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "\n",
       "                    equation  \\\n",
       "0           x=20/(4-1.5)*1.5   \n",
       "1  x=196/(80%+((3)/(3+2))-1)   \n",
       "2           x=30*(1-(1/5))+5   \n",
       "3            x=(230-35)/3-48   \n",
       "4              x=300/(1+20%)   \n",
       "\n",
       "                                      segmented_text  \n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...  \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...  \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...  \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...  \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The number of problems"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "23162"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(data['id'].unique())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part of missing values for every column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "id                0.0\n",
       "original_text     0.0\n",
       "equation          0.0\n",
       "segmented_text    0.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.isnull().sum() / len(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cut words and find verbs in problems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Verbs may be quite useful for solving math word problems, because sometimes a verb means an operator in equation. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Building prefix dict from the default dictionary ...\n",
      "Loading model from cache /tmp/jieba.cache\n",
      "Loading model cost 0.563 seconds.\n",
      "Prefix dict has been built successfully.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "      <th>content</th>\n",
       "      <th>verbs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "      <td>[甲数, 除以, 乙数, 的, 商是, 1.5, ，, 如果, 甲数, 增加, 20, ，,...</td>\n",
       "      <td>[增加, 是]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "      <td>[客车, 和, 货车, 分别, 从, A, 、, B, 两站, 同时, 相向, 开出, ，,...</td>\n",
       "      <td>[相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "      <td>[图书, 角有, 书, 30, 本, ，, 第一天, 借出, 了, (, 1, /, 5, ...</td>\n",
       "      <td>[借出, 回, 有]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "      <td>[甲, 、, 乙, 两车, 同时, 从, 相距, 230, 千米, 的, 两地, 相向, 而...</td>\n",
       "      <td>[相距, 相向, 相距, 已知]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "      <td>[果园, 里, 有, 苹果树, 300, 棵, ，, 比, 桔树, 多, 20%, ，, 桔...</td>\n",
       "      <td>[有, 有]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "\n",
       "                    equation  \\\n",
       "0           x=20/(4-1.5)*1.5   \n",
       "1  x=196/(80%+((3)/(3+2))-1)   \n",
       "2           x=30*(1-(1/5))+5   \n",
       "3            x=(230-35)/3-48   \n",
       "4              x=300/(1+20%)   \n",
       "\n",
       "                                      segmented_text  \\\n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...   \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...   \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...   \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...   \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？   \n",
       "\n",
       "                                             content  \\\n",
       "0  [甲数, 除以, 乙数, 的, 商是, 1.5, ，, 如果, 甲数, 增加, 20, ，,...   \n",
       "1  [客车, 和, 货车, 分别, 从, A, 、, B, 两站, 同时, 相向, 开出, ，,...   \n",
       "2  [图书, 角有, 书, 30, 本, ，, 第一天, 借出, 了, (, 1, /, 5, ...   \n",
       "3  [甲, 、, 乙, 两车, 同时, 从, 相距, 230, 千米, 的, 两地, 相向, 而...   \n",
       "4  [果园, 里, 有, 苹果树, 300, 棵, ，, 比, 桔树, 多, 20%, ，, 桔...   \n",
       "\n",
       "                                    verbs  \n",
       "0                                 [增加, 是]  \n",
       "1  [相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]  \n",
       "2                              [借出, 回, 有]  \n",
       "3                        [相距, 相向, 相距, 已知]  \n",
       "4                                  [有, 有]  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import jieba.posseg as pseg\n",
    "def cut_word(text):\n",
    "    return jieba.lcut(text)\n",
    "\n",
    "def find_verbs(text):\n",
    "    words = pseg.cut(text)\n",
    "    return [word for word,flag in words if flag == 'v']\n",
    "\n",
    "data['content']=data['original_text'].apply(cut_word)\n",
    "data['verbs']=data['original_text'].apply(find_verbs)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Count of words of problems"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "      <th>content</th>\n",
       "      <th>verbs</th>\n",
       "      <th>word_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "      <td>[甲数, 除以, 乙数, 的, 商是, 1.5, ，, 如果, 甲数, 增加, 20, ，,...</td>\n",
       "      <td>[增加, 是]</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "      <td>[客车, 和, 货车, 分别, 从, A, 、, B, 两站, 同时, 相向, 开出, ，,...</td>\n",
       "      <td>[相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "      <td>[图书, 角有, 书, 30, 本, ，, 第一天, 借出, 了, (, 1, /, 5, ...</td>\n",
       "      <td>[借出, 回, 有]</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "      <td>[甲, 、, 乙, 两车, 同时, 从, 相距, 230, 千米, 的, 两地, 相向, 而...</td>\n",
       "      <td>[相距, 相向, 相距, 已知]</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "      <td>[果园, 里, 有, 苹果树, 300, 棵, ，, 比, 桔树, 多, 20%, ，, 桔...</td>\n",
       "      <td>[有, 有]</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "\n",
       "                    equation  \\\n",
       "0           x=20/(4-1.5)*1.5   \n",
       "1  x=196/(80%+((3)/(3+2))-1)   \n",
       "2           x=30*(1-(1/5))+5   \n",
       "3            x=(230-35)/3-48   \n",
       "4              x=300/(1+20%)   \n",
       "\n",
       "                                      segmented_text  \\\n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...   \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...   \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...   \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...   \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？   \n",
       "\n",
       "                                             content  \\\n",
       "0  [甲数, 除以, 乙数, 的, 商是, 1.5, ，, 如果, 甲数, 增加, 20, ，,...   \n",
       "1  [客车, 和, 货车, 分别, 从, A, 、, B, 两站, 同时, 相向, 开出, ，,...   \n",
       "2  [图书, 角有, 书, 30, 本, ，, 第一天, 借出, 了, (, 1, /, 5, ...   \n",
       "3  [甲, 、, 乙, 两车, 同时, 从, 相距, 230, 千米, 的, 两地, 相向, 而...   \n",
       "4  [果园, 里, 有, 苹果树, 300, 棵, ，, 比, 桔树, 多, 20%, ，, 桔...   \n",
       "\n",
       "                                    verbs  word_count  \n",
       "0                                 [增加, 是]          24  \n",
       "1  [相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]          63  \n",
       "2                              [借出, 回, 有]          28  \n",
       "3                        [相距, 相向, 相距, 已知]          38  \n",
       "4                                  [有, 有]          17  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def getsize(ser):\n",
    "    return len(ser)\n",
    "\n",
    "data['word_count']=data['content'].apply(getsize)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The length of problems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This picture shows that the length of most problems are about 20 to 40 chinese words.It may be helpful for design of model's input "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"500\" style=\"\" viewBox=\"0 0 700 500\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"500\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-ac3c42\"><g class=\"clips\"><clipPath id=\"clipac3c42xyplot\" class=\"plotclip\"><rect width=\"540\" height=\"320\"/></clipPath><clipPath class=\"axesclip\" id=\"clipac3c42x\"><rect x=\"80\" y=\"0\" width=\"540\" height=\"500\"/></clipPath><clipPath class=\"axesclip\" id=\"clipac3c42y\"><rect x=\"0\" y=\"100\" width=\"700\" height=\"320\"/></clipPath><clipPath class=\"axesclip\" id=\"clipac3c42xy\"><rect x=\"80\" y=\"100\" width=\"540\" height=\"320\"/></clipPath></g><g class=\"gradients\"/><g class=\"patterns\"/></defs><g class=\"bglayer\"><rect class=\"bg\" x=\"80\" y=\"100\" width=\"540\" height=\"320\" style=\"fill: rgb(229, 236, 246); fill-opacity: 1; stroke-width: 0;\"/></g><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"><g class=\"subplot xy\"><g class=\"layer-subplot\"><g class=\"shapelayer\"/><g class=\"imagelayer\"/></g><g class=\"gridlayer\"><g class=\"x\"/><g class=\"y\"><path class=\"ygrid crisp\" transform=\"translate(0,364.83)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,309.65999999999997)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,254.48)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,199.31)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,144.14)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/></g></g><g class=\"zerolinelayer\"><path class=\"yzl zl crisp\" transform=\"translate(0,420)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 2px;\"/></g><path class=\"xlines-below\"/><path class=\"ylines-below\"/><g class=\"overlines-below\"/><g class=\"xaxislayer-below\"/><g class=\"yaxislayer-below\"/><g class=\"overaxes-below\"/><g class=\"plot\" transform=\"translate(80,100)\" clip-path=\"url(#clipac3c42xyplot)\"><g class=\"barlayer mlayer\"><g class=\"trace bars\" style=\"opacity: 1;\"><g class=\"points\"><g class=\"point\"><path d=\"M83.85,320V16H87.37V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M88.24,320V27.31H91.76V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M97.02,320V35.03H100.54V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M92.63,320V35.03H96.15V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M105.8,320V38.9H109.32V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M101.41,320V41.1H104.93V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M75.07,320V56.55H78.59V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M79.46,320V60.14H82.98V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M114.59,320V64.55H118.1V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M110.2,320V69.79H113.71V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M123.37,320V94.34H126.88V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M70.68,320V107.31H74.2V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M118.98,320V111.72H122.49V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M66.29,320V112.83H69.8V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M127.76,320V131.86H131.27V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M132.15,320V136.83H135.66V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M140.93,320V153.93H144.44V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M61.9,320V154.76H65.41V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M136.54,320V155.03H140.05V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M57.51,320V178.21H61.02V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M145.32,320V190.07H148.83V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M149.71,320V201.38H153.22V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M53.12,320V204.97H56.63V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M154.1,320V207.72H157.61V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M48.73,320V209.66H52.24V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M158.49,320V225.93H162V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M162.88,320V231.45H166.39V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M44.34,320V236.97H47.85V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M31.17,320V244.41H34.68V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M167.27,320V248.55H170.78V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M171.66,320V249.1H175.17V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M39.95,320V253.24H43.46V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M176.05,320V260.69H179.56V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M35.56,320V262.9H39.07V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M180.44,320V272H183.95V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M184.83,320V272.83H188.34V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M189.22,320V281.66H192.73V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M22.39,320V284.14H25.9V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M26.78,320V286.9H30.29V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M18,320V289.1H21.51V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M193.61,320V292.14H197.12V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M198,320V292.41H201.51V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M13.61,320V299.31H17.12V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M202.39,320V299.31H205.9V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M211.17,320V301.24H214.68V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M206.78,320V301.52H210.29V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M9.22,320V301.79H12.73V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M215.56,320V304.28H219.07V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M228.73,320V307.86H232.24V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M219.95,320V308.14H223.46V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M224.34,320V308.97H227.85V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M233.12,320V311.72H236.63V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M4.83,320V311.72H8.34V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M241.9,320V313.38H245.41V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M237.51,320V313.66H241.02V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M263.85,320V315.59H267.37V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M246.29,320V315.59H249.8V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M268.24,320V315.86H271.76V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M259.46,320V316.14H262.98V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M250.68,320V316.41H254.2V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M255.07,320V316.41H258.59V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M281.41,320V316.69H284.93V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M277.02,320V317.52H280.54V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M298.98,320V317.79H302.49V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M272.63,320V317.79H276.15V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M290.2,320V318.07H293.71V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M285.8,320V318.07H289.32V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M294.59,320V318.62H298.1V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M320.93,320V318.62H324.44V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M312.15,320V318.9H315.66V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M342.88,320V319.17H346.39V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M303.37,320V319.17H306.88V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M316.54,320V319.17H320.05V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M404.34,320V319.17H407.85V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M382.39,320V319.45H385.9V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M325.32,320V319.45H328.83V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M351.66,320V319.45H355.17V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M338.49,320V319.45H342V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M307.76,320V319.45H311.27V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M399.95,320V319.45H403.46V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M334.1,320V319.45H337.61V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M356.05,320V319.72H359.56V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M430.68,320V319.72H434.2V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M329.71,320V319.72H333.22V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M347.27,320V319.72H350.78V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0.44,320V319.72H3.95V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M378,320V319.72H381.51V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M408.73,320V319.72H412.24V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M373.61,320V319.72H377.12V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M536.05,320V319.72H539.56V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M369.22,320V319.72H372.73V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g></g></g></g></g><g class=\"overplot\"/><path class=\"xlines-above crisp\" d=\"M0,0\" style=\"fill: none;\"/><path class=\"ylines-above crisp\" d=\"M0,0\" style=\"fill: none;\"/><g class=\"overlines-above\"/><g class=\"xaxislayer-above\"><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" transform=\"translate(152.44,0)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">20</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(240.24,0)\">40</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(328.05,0)\">60</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(415.85,0)\">80</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(503.66,0)\">100</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(591.46,0)\">120</text></g></g><g class=\"yaxislayer-above\"><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,420)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">0</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,364.83)\">200</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,309.65999999999997)\">400</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,254.48)\">600</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,199.31)\">800</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,144.14)\">1000</text></g></g><g class=\"overaxes-above\"/></g></g><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"/><g class=\"iciclelayer\"/><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-ac3c42\"><g class=\"clips\"/></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">The length of problems</text></g><g class=\"g-xtitle\"><text class=\"xtitle\" x=\"350\" y=\"460.8\" text-anchor=\"middle\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 14px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">word_count</text></g><g class=\"g-ytitle\"><text class=\"ytitle\" transform=\"rotate(-90,27.496875000000003,260)\" x=\"27.496875000000003\" y=\"260\" text-anchor=\"middle\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 14px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">problem_count</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "cnt = data['word_count'].value_counts().reset_index()\n",
    "cnt.columns = [ 'word_count' , 'problem_count']\n",
    "\n",
    "fig = px.bar(\n",
    "    cnt , x = 'word_count', y = 'problem_count' ,\n",
    "    title = 'The length of problems'\n",
    ")\n",
    "fig.show('svg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Delete stopword"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "      <th>content</th>\n",
       "      <th>verbs</th>\n",
       "      <th>word_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "      <td>[甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]</td>\n",
       "      <td>[增加, 是]</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "      <td>[客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...</td>\n",
       "      <td>[相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "      <td>[图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]</td>\n",
       "      <td>[借出, 回, 有]</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "      <td>[甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...</td>\n",
       "      <td>[相距, 相向, 相距, 已知]</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "      <td>[果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]</td>\n",
       "      <td>[有, 有]</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "\n",
       "                    equation  \\\n",
       "0           x=20/(4-1.5)*1.5   \n",
       "1  x=196/(80%+((3)/(3+2))-1)   \n",
       "2           x=30*(1-(1/5))+5   \n",
       "3            x=(230-35)/3-48   \n",
       "4              x=300/(1+20%)   \n",
       "\n",
       "                                      segmented_text  \\\n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...   \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...   \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...   \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...   \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？   \n",
       "\n",
       "                                             content  \\\n",
       "0    [甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]   \n",
       "1  [客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...   \n",
       "2       [图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]   \n",
       "3  [甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...   \n",
       "4               [果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]   \n",
       "\n",
       "                                    verbs  word_count  \n",
       "0                                 [增加, 是]          24  \n",
       "1  [相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]          63  \n",
       "2                              [借出, 回, 有]          28  \n",
       "3                        [相距, 相向, 相距, 已知]          38  \n",
       "4                                  [有, 有]          17  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_stopword():\n",
    "    s = set()\n",
    "    with open(\"../raw_data/stopword/stopword.txt\",\"r\",encoding=\"UTF-8\") as f:\n",
    "        for line in f:\n",
    "            s.add(line.strip())\n",
    "    return s\n",
    "\n",
    "def delete_stopword(words):\n",
    "    return [w for w in words if (w not in stopword)]\n",
    "\n",
    "stopword=get_stopword()\n",
    "data['content']=data['content'].apply(delete_stopword)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The keywords"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Keywords may show us the topic of problem sometimes.They are useful for our analysis.\n",
    "This report use textrank algorithm in 'jieba'. because length of problem are usually short ,TF/IDF may be not suitable for this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "      <th>content</th>\n",
       "      <th>verbs</th>\n",
       "      <th>word_count</th>\n",
       "      <th>keywords</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "      <td>[甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]</td>\n",
       "      <td>[增加, 是]</td>\n",
       "      <td>24</td>\n",
       "      <td>[甲数, 商是, 乙数]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "      <td>[客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...</td>\n",
       "      <td>[相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]</td>\n",
       "      <td>63</td>\n",
       "      <td>[路程, 前进, 小时]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "      <td>[图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]</td>\n",
       "      <td>[借出, 回, 有]</td>\n",
       "      <td>28</td>\n",
       "      <td>[角有书, 图书]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "      <td>[甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...</td>\n",
       "      <td>[相距, 相向, 相距, 已知]</td>\n",
       "      <td>38</td>\n",
       "      <td>[小时, 相距, 已知]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "      <td>[果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]</td>\n",
       "      <td>[有, 有]</td>\n",
       "      <td>17</td>\n",
       "      <td>[苹果树, 果园]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "\n",
       "                    equation  \\\n",
       "0           x=20/(4-1.5)*1.5   \n",
       "1  x=196/(80%+((3)/(3+2))-1)   \n",
       "2           x=30*(1-(1/5))+5   \n",
       "3            x=(230-35)/3-48   \n",
       "4              x=300/(1+20%)   \n",
       "\n",
       "                                      segmented_text  \\\n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...   \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...   \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...   \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...   \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？   \n",
       "\n",
       "                                             content  \\\n",
       "0    [甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]   \n",
       "1  [客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...   \n",
       "2       [图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]   \n",
       "3  [甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...   \n",
       "4               [果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]   \n",
       "\n",
       "                                    verbs  word_count      keywords  \n",
       "0                                 [增加, 是]          24  [甲数, 商是, 乙数]  \n",
       "1  [相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]          63  [路程, 前进, 小时]  \n",
       "2                              [借出, 回, 有]          28     [角有书, 图书]  \n",
       "3                        [相距, 相向, 相距, 已知]          38  [小时, 相距, 已知]  \n",
       "4                                  [有, 有]          17     [苹果树, 果园]  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import jieba.analyse\n",
    "def get_keyword(text):\n",
    "    topk = min(3,len(text))\n",
    "    keyword = [word for word in jieba.analyse.textrank(text, topK = topk)]\n",
    "    return keyword\n",
    "\n",
    "data['keywords'] = data['original_text'].apply(get_keyword)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Topic Prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Classify problems by their topics may be helpful for models and analyse the result of models in different fields.\n",
    "Because there're no labels in original data, unsupervised algorithm LDA may be suitable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "keywords of topics\n",
      "(0, '0.067*\"米\" + 0.018*\"学校\" + 0.016*\"长\" + 0.013*\"学生\" + 0.013*\"分钟\"')\n",
      "(1, '0.060*\"千米\" + 0.057*\"小时\" + 0.026*\"棵\" + 0.024*\"行\" + 0.020*\"数\"')\n",
      "(2, '0.026*\"页\" + 0.025*\"修\" + 0.025*\"千克\" + 0.024*\"米\" + 0.014*\"天\"')\n",
      "(3, '0.041*\"千克\" + 0.031*\"吨\" + 0.022*\"生产\" + 0.017*\"计划\" + 0.015*\"苹果\"')\n",
      "(4, '0.117*\"元\" + 0.030*\"买\" + 0.019*\"钱\" + 0.014*\"原价\" + 0.011*\"女生\"')\n"
     ]
    }
   ],
   "source": [
    "from gensim import corpora, models\n",
    "\n",
    "all_words = []\n",
    "for text in data['content']:\n",
    "    all_words.append(text)\n",
    "#print(all_words)\n",
    "dictionary = corpora.Dictionary(all_words)\n",
    "corpus = [dictionary.doc2bow(text) for text in all_words]\n",
    "\n",
    "lda = models.ldamodel.LdaModel(corpus = corpus, id2word = dictionary, num_topics = 5)\n",
    "\n",
    "print('keywords of topics')\n",
    "for topic in lda.print_topics(num_words = 5):\n",
    "    print(topic)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "      <th>content</th>\n",
       "      <th>verbs</th>\n",
       "      <th>word_count</th>\n",
       "      <th>keywords</th>\n",
       "      <th>topic</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "      <td>[甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]</td>\n",
       "      <td>[增加, 是]</td>\n",
       "      <td>24</td>\n",
       "      <td>[甲数, 商是, 乙数]</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "      <td>[客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...</td>\n",
       "      <td>[相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]</td>\n",
       "      <td>63</td>\n",
       "      <td>[路程, 前进, 小时]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "      <td>[图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]</td>\n",
       "      <td>[借出, 回, 有]</td>\n",
       "      <td>28</td>\n",
       "      <td>[角有书, 图书]</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "      <td>[甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...</td>\n",
       "      <td>[相距, 相向, 相距, 已知]</td>\n",
       "      <td>38</td>\n",
       "      <td>[小时, 相距, 已知]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "      <td>[果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]</td>\n",
       "      <td>[有, 有]</td>\n",
       "      <td>17</td>\n",
       "      <td>[苹果树, 果园]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5901</td>\n",
       "      <td>某班学生参加数学兴趣小组，其中，参加的男生是全班人数的20%，参加的女生是全班人数的(2/7...</td>\n",
       "      <td>x=(5-2)/(20%+(2/7)+(3/5)-1)</td>\n",
       "      <td>某班 学生 参加 数学 兴趣小组 ， 其中 ， 参加 的 男生 是 全班 人数 的 20% ...</td>\n",
       "      <td>[某班, 学生, 参加, 数学, 兴趣小组, 参加, 男生, 全班, 人数, 20%, 参加...</td>\n",
       "      <td>[参加, 参加, 是, 参加, 是, 参加, 有]</td>\n",
       "      <td>55</td>\n",
       "      <td>[参加, 全班, 人数]</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>12815</td>\n",
       "      <td>某商店有36筐苹果，每筐重40千克，运来的苹果重量是梨的8倍，运来梨多少千克？</td>\n",
       "      <td>x=36*40/8</td>\n",
       "      <td>某 商店 有 36 筐 苹果 ， 每 筐 重 40 千克 ， 运 来 的 苹果 重量 是 梨...</td>\n",
       "      <td>[商店, 36, 筐, 苹果, 每筐, 重, 40, 千克, 运来, 苹果, 重量, 梨, ...</td>\n",
       "      <td>[有, 运来, 是, 运来]</td>\n",
       "      <td>27</td>\n",
       "      <td>[苹果, 重量, 运来]</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>19584</td>\n",
       "      <td>阳光小学举行少先队队仪式比赛，全校共有720名少先队员，平均分成15个中队，每个中队又分成4...</td>\n",
       "      <td>x=720/15/4</td>\n",
       "      <td>阳光 小学 举行 少先队 队 仪式 比赛 ， 全校 共有 720 名 少先队员 ， 平均 分...</td>\n",
       "      <td>[阳光, 小学, 少先队, 队, 仪式, 比赛, 全校, 共有, 720, 名, 少先队员,...</td>\n",
       "      <td>[举行, 共有, 分成, 分成, 分, 有]</td>\n",
       "      <td>36</td>\n",
       "      <td>[仪式, 比赛, 少先队员]</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>10773</td>\n",
       "      <td>大猴有9只，小猴比大猴多4只，小猴有多少只．</td>\n",
       "      <td>x=9+4</td>\n",
       "      <td>大 猴 有 9 只 ， 小 猴 比 大 猴 多 4 只 ， 小 猴 有 多少 只 ．</td>\n",
       "      <td>[大猴, 小猴, 比大猴, 小猴]</td>\n",
       "      <td>[有, 有]</td>\n",
       "      <td>16</td>\n",
       "      <td>[]</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>22037</td>\n",
       "      <td>一根电线，用去(3/5)米，用去的比剩下的少(1/4)米，这根电线原来有多长？</td>\n",
       "      <td>x=(3/5)+(1/4)+(3/5)</td>\n",
       "      <td>一 根 电线 ， 用 去 (3/5) 米 ， 用 去 的 比 剩下 的 少 (1/4) 米 ...</td>\n",
       "      <td>[一根, 电线, 米, 剩下, 少, 米, 这根, 电线, 多长]</td>\n",
       "      <td>[去, 用去, 剩下, 有]</td>\n",
       "      <td>32</td>\n",
       "      <td>[用去, 电线, 剩下]</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "5   5901  某班学生参加数学兴趣小组，其中，参加的男生是全班人数的20%，参加的女生是全班人数的(2/7...   \n",
       "6  12815            某商店有36筐苹果，每筐重40千克，运来的苹果重量是梨的8倍，运来梨多少千克？   \n",
       "7  19584  阳光小学举行少先队队仪式比赛，全校共有720名少先队员，平均分成15个中队，每个中队又分成4...   \n",
       "8  10773                             大猴有9只，小猴比大猴多4只，小猴有多少只．   \n",
       "9  22037            一根电线，用去(3/5)米，用去的比剩下的少(1/4)米，这根电线原来有多长？   \n",
       "\n",
       "                      equation  \\\n",
       "0             x=20/(4-1.5)*1.5   \n",
       "1    x=196/(80%+((3)/(3+2))-1)   \n",
       "2             x=30*(1-(1/5))+5   \n",
       "3              x=(230-35)/3-48   \n",
       "4                x=300/(1+20%)   \n",
       "5  x=(5-2)/(20%+(2/7)+(3/5)-1)   \n",
       "6                    x=36*40/8   \n",
       "7                   x=720/15/4   \n",
       "8                        x=9+4   \n",
       "9          x=(3/5)+(1/4)+(3/5)   \n",
       "\n",
       "                                      segmented_text  \\\n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...   \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...   \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...   \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...   \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？   \n",
       "5  某班 学生 参加 数学 兴趣小组 ， 其中 ， 参加 的 男生 是 全班 人数 的 20% ...   \n",
       "6  某 商店 有 36 筐 苹果 ， 每 筐 重 40 千克 ， 运 来 的 苹果 重量 是 梨...   \n",
       "7  阳光 小学 举行 少先队 队 仪式 比赛 ， 全校 共有 720 名 少先队员 ， 平均 分...   \n",
       "8         大 猴 有 9 只 ， 小 猴 比 大 猴 多 4 只 ， 小 猴 有 多少 只 ．   \n",
       "9  一 根 电线 ， 用 去 (3/5) 米 ， 用 去 的 比 剩下 的 少 (1/4) 米 ...   \n",
       "\n",
       "                                             content  \\\n",
       "0    [甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]   \n",
       "1  [客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...   \n",
       "2       [图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]   \n",
       "3  [甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...   \n",
       "4               [果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]   \n",
       "5  [某班, 学生, 参加, 数学, 兴趣小组, 参加, 男生, 全班, 人数, 20%, 参加...   \n",
       "6  [商店, 36, 筐, 苹果, 每筐, 重, 40, 千克, 运来, 苹果, 重量, 梨, ...   \n",
       "7  [阳光, 小学, 少先队, 队, 仪式, 比赛, 全校, 共有, 720, 名, 少先队员,...   \n",
       "8                                  [大猴, 小猴, 比大猴, 小猴]   \n",
       "9                  [一根, 电线, 米, 剩下, 少, 米, 这根, 电线, 多长]   \n",
       "\n",
       "                                    verbs  word_count        keywords  topic  \n",
       "0                                 [增加, 是]          24    [甲数, 商是, 乙数]      3  \n",
       "1  [相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]          63    [路程, 前进, 小时]      1  \n",
       "2                              [借出, 回, 有]          28       [角有书, 图书]      3  \n",
       "3                        [相距, 相向, 相距, 已知]          38    [小时, 相距, 已知]      1  \n",
       "4                                  [有, 有]          17       [苹果树, 果园]      1  \n",
       "5               [参加, 参加, 是, 参加, 是, 参加, 有]          55    [参加, 全班, 人数]      0  \n",
       "6                          [有, 运来, 是, 运来]          27    [苹果, 重量, 运来]      3  \n",
       "7                  [举行, 共有, 分成, 分成, 分, 有]          36  [仪式, 比赛, 少先队员]      2  \n",
       "8                                  [有, 有]          16              []      2  \n",
       "9                          [去, 用去, 剩下, 有]          32    [用去, 电线, 剩下]      0  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "topic = []\n",
    "for i,values in enumerate(lda.inference(corpus)[0]):\n",
    "    topic_val = 0\n",
    "    topic_id = 0\n",
    "    for tid, val in enumerate(values):\n",
    "        if val > topic_val:\n",
    "            topic_val = val\n",
    "            topic_id = tid\n",
    "    topic.append(topic_id)\n",
    "data['topic'] = topic\n",
    "data.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"450\" style=\"\" viewBox=\"0 0 700 450\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"450\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-ea732a\"><g class=\"clips\"/><g class=\"gradients\"/></defs><g class=\"bglayer\"/><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"/><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"><g class=\"trace\" stroke-linejoin=\"round\" style=\"opacity: 1;\"><g class=\"slice\"><path class=\"surface\" d=\"M350,235l0,-135a135,135 0 0 1 134.10589851916407,119.48845648016716Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(403.93080881744174,179.26852115413095)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(255, 255, 255); fill-opacity: 1; white-space: pre;\">23.2%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l-130.95658245977242,-32.79288810789351a135,135 0 0 1 130.95658245977242,-102.2071118921065Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(298.57787637510444,173.91040733570503)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">21.1%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l-77.79654903661269,110.32994588049935a135,135 0 0 1 -53.16003342315973,-143.12283398839287Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(269.16007646472406,269.8232021576181)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">19.1%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l72.65852117165534,113.77934479134652a135,135 0 0 1 -150.455070208268,-3.449398910847165Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(348.0131751181988,326.4577352858047)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">18.8%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l134.10589851916407,-15.511543519832953a135,135 0 0 1 -61.447377347508734,129.29088831117946Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(429.6831444907214,277.66744973047037)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">17.8%</text></g></g></g></g><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-ea732a\"><g class=\"clips\"/><clipPath id=\"legendea732a\"><rect width=\"53\" height=\"105\" x=\"0\" y=\"0\"/></clipPath></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"legend\" pointer-events=\"all\" transform=\"translate(630.8,100)\"><rect class=\"bg\" shape-rendering=\"crispEdges\" style=\"stroke: rgb(68, 68, 68); stroke-opacity: 1; fill: rgb(255, 255, 255); fill-opacity: 1; stroke-width: 0px;\" width=\"53\" height=\"105\" x=\"0\" y=\"0\"/><g class=\"scrollbox\" transform=\"\" clip-path=\"url('#legendea732a')\"><g class=\"groups\"><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,14.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">3</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"48\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,33.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"48\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,52.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">0</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"48\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,71.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">4</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"48\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,90.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"48\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g></g></g><rect class=\"scrollbar\" rx=\"20\" ry=\"3\" width=\"0\" height=\"0\" style=\"fill: rgb(128, 139, 164); fill-opacity: 1;\" x=\"0\" y=\"0\"/></g><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">Topic of problems</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "output = data['topic'].value_counts().reset_index()\n",
    "output.columns=['topic_id','number of problems']\n",
    "\n",
    "fig = px.pie(\n",
    "    output,\n",
    "    names = 'topic_id',\n",
    "    values = 'number of problems',\n",
    "    title = 'Topic of problems'    \n",
    ")\n",
    "\n",
    "fig.show(\"svg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Number of operators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you know how many operators are there in equations, it may be much easier for you to solve math word problems.Especially when your algorithm are based on equation templates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"450\" style=\"\" viewBox=\"0 0 700 450\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"450\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-492aa1\"><g class=\"clips\"/><g class=\"gradients\"/></defs><g class=\"bglayer\"/><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"/><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"><g class=\"trace\" stroke-linejoin=\"round\" style=\"opacity: 1;\"><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l0,-135a135,135 0 0 1 132.7464701138855,159.5636860487855Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(388.17491313413376,190.9832144756767)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(255, 255, 255); fill-opacity: 1; white-space: pre;\">27.9%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l-122.38205775499387,-56.98799820710802a135,135 0 0 1 122.38205775499387,-78.01200179289198Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(282.3034389641329,165.756828266385)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">18.1%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l-124.14323920666564,53.04202257905135a135,135 0 0 1 1.7611814516717743,-110.03002078615937Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(233.60236562865293,238.26190178353338)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">13.4%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l-58.62614865127458,121.60581686053777a135,135 0 0 1 -65.51709055539106,-68.56379428148642Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(257.2674179826704,308.8197268808352)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">11.4%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l27.77905238331153,132.1110300038768a135,135 0 0 1 -86.40520103458611,-10.505213143339034Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(317.178672685883,341.1395844779375)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">10.4%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l90.3262814713609,100.33026898975446a135,135 0 0 1 -62.54722908804937,31.780761014122348Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(25, 211, 243); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(378.04030077752384,335.3282976549036)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">8.37%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l109.25387622990823,79.30063384828581a135,135 0 0 1 -18.92759475854733,21.029635141468646Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 102, 146); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(407.87956527552996,311.9987254997095)rotate(41.98860202055096)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">3.34%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l121.37427466226765,59.10402228451472a135,135 0 0 1 -12.120398432359423,20.196611563771093Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(182, 232, 128); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(423.6236918915859,297.0799134163091)rotate(30.968828253173285)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">2.78%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l127.84561898798172,43.3647057591748a135,135 0 0 1 -6.471344325714071,15.739316525339923Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 151, 255); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(478.0535135479553,297.7916692550556)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2.01%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M454.3582010479553,286.3364357067024V292.9947942550556h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l130.87613032964182,33.11251289072772a135,135 0 0 1 -3.0305113416601017,10.252192868447075Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(254, 203, 82); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(482.65771150892374,283.3854192550556)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1.26%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M458.96239900892374,273.26861955772944V278.5885442550556h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M329.5,235l132.74647011388552,24.56368604878546a135,135 0 0 1 -1.870339784243697,8.548826841942262Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(198, 202, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(485.0759004432045,268.9791692550556)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1.03%</text></g></g></g></g><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-492aa1\"><g class=\"clips\"/><clipPath id=\"legend492aa1\"><rect width=\"78\" height=\"219\" x=\"0\" y=\"0\"/></clipPath></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"legend\" pointer-events=\"all\" transform=\"translate(568.1800000000001,100)\"><rect class=\"bg\" shape-rendering=\"crispEdges\" style=\"stroke: rgb(68, 68, 68); stroke-opacity: 1; fill: rgb(255, 255, 255); fill-opacity: 1; stroke-width: 0px;\" width=\"78\" height=\"219\" x=\"0\" y=\"0\"/><g class=\"scrollbox\" transform=\"\" clip-path=\"url('#legend492aa1')\"><g class=\"groups\"><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,14.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">3</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,33.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,52.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,71.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">5</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,90.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">4</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,109.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">6</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(25, 211, 243); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,128.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">7</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 102, 146); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,147.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">8</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(182, 232, 128); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,166.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">other</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 151, 255); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,185.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">9</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(254, 203, 82); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,204.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">10</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(198, 202, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g></g></g><rect class=\"scrollbar\" rx=\"20\" ry=\"3\" width=\"0\" height=\"0\" style=\"fill: rgb(128, 139, 164); fill-opacity: 1;\" x=\"0\" y=\"0\"/></g><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">Number of operators</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def num_of_operators(equation):\n",
    "    cnt = 0\n",
    "    for op in equation:\n",
    "        if op =='(' or op == '+' or op == '-' or op == '*' or op == '/' or op == '^':\n",
    "            cnt += 1\n",
    "    return cnt\n",
    "\n",
    "tmp = data.loc[:,['equation']]\n",
    "tmp['operators_cnt'] = tmp['equation'].apply(num_of_operators)\n",
    "cnt = tmp['operators_cnt'].value_counts().reset_index()\n",
    "output = cnt.head(10)\n",
    "other_sum = cnt['operators_cnt'].sum() - output['operators_cnt'].sum()\n",
    "\n",
    "output = output.sort_values(['operators_cnt'])\n",
    "output.loc[10] = ['other', other_sum]\n",
    "\n",
    "output.columns=['number of operators','number of problems']\n",
    "\n",
    "fig = px.pie(\n",
    "    output,\n",
    "    names = 'number of operators',\n",
    "    values = 'number of problems',\n",
    "    title = 'Number of operators'    \n",
    ")\n",
    "\n",
    "fig.show(\"svg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate difficulty"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Different problems have different difficulty.People may choose different way to solve problems when difficulty of problems are different,and so is AI.To evaluate difficulty of problems, the kinds of operators in equations may be useful.Value of them are as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"500\" style=\"\" viewBox=\"0 0 700 500\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"500\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-33f568\"><g class=\"clips\"><clipPath id=\"clip33f568xyplot\" class=\"plotclip\"><rect width=\"540\" height=\"320\"/></clipPath><clipPath class=\"axesclip\" id=\"clip33f568x\"><rect x=\"80\" y=\"0\" width=\"540\" height=\"500\"/></clipPath><clipPath class=\"axesclip\" id=\"clip33f568y\"><rect x=\"0\" y=\"100\" width=\"700\" height=\"320\"/></clipPath><clipPath class=\"axesclip\" id=\"clip33f568xy\"><rect x=\"80\" y=\"100\" width=\"540\" height=\"320\"/></clipPath></g><g class=\"gradients\"/><g class=\"patterns\"/></defs><g class=\"bglayer\"><rect class=\"bg\" x=\"80\" y=\"100\" width=\"540\" height=\"320\" style=\"fill: rgb(229, 236, 246); fill-opacity: 1; stroke-width: 0;\"/></g><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"><g class=\"subplot xy\"><g class=\"layer-subplot\"><g class=\"shapelayer\"/><g class=\"imagelayer\"/></g><g class=\"gridlayer\"><g class=\"x\"/><g class=\"y\"><path class=\"ygrid crisp\" transform=\"translate(0,340.75)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,261.5)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,182.25)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"ygrid crisp\" transform=\"translate(0,103)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/></g></g><g class=\"zerolinelayer\"><path class=\"yzl zl crisp\" transform=\"translate(0,420)\" d=\"M80,0h540\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 2px;\"/></g><path class=\"xlines-below\"/><path class=\"ylines-below\"/><g class=\"overlines-below\"/><g class=\"xaxislayer-below\"/><g class=\"yaxislayer-below\"/><g class=\"overaxes-below\"/><g class=\"plot\" transform=\"translate(80,100)\" clip-path=\"url(#clip33f568xyplot)\"><g class=\"barlayer mlayer\"><g class=\"trace bars\" style=\"opacity: 1;\"><g class=\"points\"><g class=\"point\"><path d=\"M13.27,320V16H14.76V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M33.82,320V72.43H35.31V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M9.53,320V100.32H11.02V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M41.29,320V106.19H42.79V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M22.61,320V121.4H24.1V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M31.95,320V150.41H33.45V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M43.16,320V160.23H44.66V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M18.87,320V167.68H20.37V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M37.56,320V188.6H39.05V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M28.21,320V191.93H29.71V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M61.85,320V203.35H63.34V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M59.98,320V224.27H61.47V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M15.13,320V232.03H16.63V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M58.11,320V238.85H59.61V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M78.66,320V252.32H80.16V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M30.08,320V253.91H31.58V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M39.43,320V262.47H40.92V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M26.35,320V263.73H27.84V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M46.9,320V266.43H48.39V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M56.24,320V268.33H57.74V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M69.32,320V271.97H70.82V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M67.45,320V272.61H68.95V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M3.92,320V273.56H5.42V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M74.93,320V274.19H76.42V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M5.79,320V275.3H7.29V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M50.64,320V279.74H52.13V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M71.19,320V280.53H72.69V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M24.48,320V281.48H25.97V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M65.58,320V283.55H67.08V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M17,320V284.02H18.5V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M45.03,320V286.08H46.53V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M52.51,320V288.46H54V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M11.4,320V290.99H12.89V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M7.66,320V293.85H9.16V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M63.72,320V294.64H65.21V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M89.88,320V295.43H91.37V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M82.4,320V295.43H83.9V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M95.48,320V296.86H96.98V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M76.8,320V297.97H78.29V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M80.53,320V298.6H82.03V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M48.77,320V300.35H50.26V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M91.74,320V300.5H93.24V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M88.01,320V302.88H89.5V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M54.37,320V303.67H55.87V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M84.27,320V304.47H85.76V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M35.69,320V305.26H37.18V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M93.61,320V305.58H95.11V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M99.22,320V305.89H100.71V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M86.14,320V306.84H87.63V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M123.51,320V308.91H125V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M114.17,320V308.91H115.66V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M117.9,320V309.22H119.4V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M97.35,320V309.54H98.84V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M73.06,320V310.01H74.55V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M106.69,320V312.39H108.19V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M108.56,320V313.5H110.06V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M101.09,320V313.66H102.58V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M104.82,320V313.82H106.32V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M116.03,320V313.82H117.53V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M119.77,320V314.29H121.27V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M157.14,320V314.45H158.64V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M125.38,320V314.45H126.87V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M20.74,320V314.93H22.24V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M112.3,320V315.25H113.79V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M102.96,320V315.4H104.45V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M132.85,320V315.56H134.35V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M127.25,320V315.72H128.74V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M110.43,320V316.04H111.92V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0.19,320V316.35H1.68V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M129.11,320V316.51H130.61V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M138.46,320V316.99H139.95V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M134.72,320V317.31H136.21V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M121.64,320V317.46H123.13V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M144.06,320V317.46H145.56V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M136.59,320V317.62H138.08V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M130.98,320V318.1H132.48V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M155.27,320V318.1H156.77V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M228.15,320V318.1H229.64V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M140.33,320V318.1H141.82V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M142.19,320V318.26H143.69V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M160.88,320V318.42H162.37V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M166.48,320V318.57H167.98V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M149.67,320V318.57H151.16V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M164.62,320V318.73H166.11V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M153.4,320V318.73H154.9V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M145.93,320V318.89H147.43V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M185.17,320V318.89H186.66V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M175.83,320V319.05H177.32V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M147.8,320V319.05H149.29V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M172.09,320V319.05H173.58V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M170.22,320V319.05H171.72V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M183.3,320V319.21H184.8V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M188.91,320V319.21H190.4V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M151.54,320V319.37H153.03V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M192.64,320V319.37H194.14V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M173.96,320V319.37H175.45V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M168.35,320V319.37H169.85V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M194.51,320V319.52H196.01V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M239.36,320V319.52H240.85V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M179.56,320V319.52H181.06V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M187.04,320V319.52H188.53V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M218.8,320V319.68H220.3V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M230.01,320V319.68H231.51V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M198.25,320V319.68H199.74V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M256.17,320V319.68H257.67V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M254.3,320V319.84H255.8V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M190.78,320V319.84H192.27V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M162.75,320V319.84H164.24V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M159.01,320V319.84H160.51V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M205.72,320V319.84H207.22V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M207.59,320V319.84H209.09V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M215.07,320V319.84H216.56V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M241.22,320V319.84H242.72V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M231.88,320V319.84H233.38V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M276.73,320V319.84H278.22V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M308.49,320V319.84H309.99V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M314.1,320V319.84H315.59V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M263.65,320V319.84H265.14V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M538.32,320V319.84H539.81V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M201.99,320V319.84H203.48V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M246.83,320V319.84H248.33V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M181.43,320V319.84H182.93V320Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g></g></g></g></g><g class=\"overplot\"/><path class=\"xlines-above crisp\" d=\"M0,0\" style=\"fill: none;\"/><path class=\"ylines-above crisp\" d=\"M0,0\" style=\"fill: none;\"/><g class=\"overlines-above\"/><g class=\"xaxislayer-above\"><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" transform=\"translate(80.93,0)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">0</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(174.36,0)\">50</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(267.78999999999996,0)\">100</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(361.21,0)\">150</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(454.64,0)\">200</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(548.06,0)\">250</text></g></g><g class=\"yaxislayer-above\"><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,420)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">0</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,340.75)\">500</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,261.5)\">1000</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,182.25)\">1500</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(0,103)\">2000</text></g></g><g class=\"overaxes-above\"/></g></g><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"/><g class=\"iciclelayer\"/><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-33f568\"><g class=\"clips\"/></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">The difficulty of problems</text></g><g class=\"g-xtitle\"><text class=\"xtitle\" x=\"350\" y=\"460.8\" text-anchor=\"middle\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 14px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">difficulty</text></g><g class=\"g-ytitle\"><text class=\"ytitle\" transform=\"rotate(-90,27.496875000000003,260)\" x=\"27.496875000000003\" y=\"260\" text-anchor=\"middle\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 14px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">problem_count</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def calc_difficulty(equation):\n",
    "    difficulty = 0\n",
    "\n",
    "    def eval(x):\n",
    "        if x == '+' : return 2\n",
    "        elif x == '-' : return 3\n",
    "        elif x == '*' : return 5\n",
    "        elif x == '/' : return 7\n",
    "        elif x == '(' : return 8\n",
    "        elif x == '%' : return 5\n",
    "        elif x == '^' : return 6\n",
    "        else : return 0\n",
    "\n",
    "    for op in equation:\n",
    "        difficulty += eval(op)\n",
    "    return difficulty\n",
    "\n",
    "\n",
    "data['difficulty'] = data['equation'].apply(calc_difficulty)\n",
    "\n",
    "cnt = data['difficulty'].value_counts().reset_index()\n",
    "cnt.columns = [ 'difficulty' , 'problem_count']\n",
    "\n",
    "fig = px.bar(\n",
    "    cnt , x = 'difficulty', y = 'problem_count' ,\n",
    "    title = 'The difficulty of problems'\n",
    ")\n",
    "fig.show('svg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The most difficult problems"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"500\" style=\"\" viewBox=\"0 0 700 500\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"500\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-2133a1\"><g class=\"clips\"><clipPath id=\"clip2133a1xyplot\" class=\"plotclip\"><rect width=\"540\" height=\"320\"/></clipPath><clipPath class=\"axesclip\" id=\"clip2133a1x\"><rect x=\"80\" y=\"0\" width=\"540\" height=\"500\"/></clipPath><clipPath class=\"axesclip\" id=\"clip2133a1y\"><rect x=\"0\" y=\"100\" width=\"700\" height=\"320\"/></clipPath><clipPath class=\"axesclip\" id=\"clip2133a1xy\"><rect x=\"80\" y=\"100\" width=\"540\" height=\"320\"/></clipPath></g><g class=\"gradients\"/><g class=\"patterns\"/></defs><g class=\"bglayer\"><rect class=\"bg\" x=\"80\" y=\"100\" width=\"540\" height=\"320\" style=\"fill: rgb(229, 236, 246); fill-opacity: 1; stroke-width: 0;\"/></g><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"><g class=\"subplot xy\"><g class=\"layer-subplot\"><g class=\"shapelayer\"/><g class=\"imagelayer\"/></g><g class=\"gridlayer\"><g class=\"x\"><path class=\"xgrid crisp\" transform=\"translate(169.06,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"xgrid crisp\" transform=\"translate(258.13,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"xgrid crisp\" transform=\"translate(347.19,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"xgrid crisp\" transform=\"translate(436.25,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"xgrid crisp\" transform=\"translate(525.31,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/><path class=\"xgrid crisp\" transform=\"translate(614.38,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 1px;\"/></g><g class=\"y\"/></g><g class=\"zerolinelayer\"><path class=\"xzl zl crisp\" transform=\"translate(80,0)\" d=\"M0,100v320\" style=\"stroke: rgb(255, 255, 255); stroke-opacity: 1; stroke-width: 2px;\"/></g><path class=\"xlines-below\"/><path class=\"ylines-below\"/><g class=\"overlines-below\"/><g class=\"xaxislayer-below\"/><g class=\"yaxislayer-below\"/><g class=\"overaxes-below\"/><g class=\"plot\" transform=\"translate(80,100)\" clip-path=\"url(#clip2133a1xyplot)\"><g class=\"barlayer mlayer\"><g class=\"trace bars\" style=\"opacity: 1;\"><g class=\"points\"><g class=\"point\"><path d=\"M0,316.8V291.2H229.78V316.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,284.8V259.2H235.13V284.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,252.8V227.2H242.25V252.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,220.8V195.2H244.03V220.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,188.8V163.2H244.03V188.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,156.8V131.2H251.16V156.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,124.8V99.2H263.63V124.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,92.8V67.2H293.91V92.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,60.8V35.2H299.25V60.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g><g class=\"point\"><path d=\"M0,28.8V3.2H513V28.8Z\" style=\"vector-effect: non-scaling-stroke; opacity: 1; stroke-width: 0.5px; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(229, 236, 246); stroke-opacity: 1;\"/></g></g></g></g></g><g class=\"overplot\"/><path class=\"xlines-above crisp\" d=\"M0,0\" style=\"fill: none;\"/><path class=\"ylines-above crisp\" d=\"M0,0\" style=\"fill: none;\"/><g class=\"overlines-above\"/><g class=\"xaxislayer-above\"><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" transform=\"translate(80,0)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">0</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(169.06,0)\">50</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(258.13,0)\">100</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(347.19,0)\">150</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(436.25,0)\">200</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(525.31,0)\">250</text></g><g class=\"xtick\"><text text-anchor=\"middle\" x=\"0\" y=\"433\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\" transform=\"translate(614.38,0)\">300</text></g></g><g class=\"yaxislayer-above\"><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,404)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">7157</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,372)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">4258</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,340)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">7576</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,308)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">14401</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,276)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">1618</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,244)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">11036</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,212)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">2861</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,180)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">12798</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,148)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">18755</text></g><g class=\"ytick\"><text text-anchor=\"end\" x=\"79\" y=\"4.199999999999999\" transform=\"translate(0,116)\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre; opacity: 1;\">17520</text></g></g><g class=\"overaxes-above\"/></g></g><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"/><g class=\"iciclelayer\"/><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-2133a1\"><g class=\"clips\"/></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">The difficulty of problems</text></g><g class=\"g-xtitle\"><text class=\"xtitle\" x=\"350\" y=\"460.8\" text-anchor=\"middle\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 14px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">difficulty</text></g><g class=\"g-ytitle\"><text class=\"ytitle\" transform=\"rotate(-90,20.825000000000003,260)\" x=\"20.825000000000003\" y=\"260\" text-anchor=\"middle\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 14px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">id</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "tmp = data[['id','original_text','difficulty']]\n",
    "tmp = tmp.sort_values(['difficulty']).tail(10)\n",
    "tmp ['id'] = tmp['id'] . astype(str)\n",
    "fig = px.bar(\n",
    "    tmp , x = 'difficulty', y = 'id' ,\n",
    "    orientation = 'h',\n",
    "    title = 'The difficulty of problems'\n",
    ")\n",
    "fig.show('svg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simplify expressions\n",
    "\n",
    "Algorithm based on templates will find templates in equations at first.To find templates, we should simplify expressions first.'+' means operator '+' or '-', '\\*' means operator '\\*' or '/', 'n' means a number.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pythonds.basic.stack import Stack\n",
    "\n",
    "\n",
    "def simplify(expr): \n",
    "    n = len(expr)\n",
    "    output = ''\n",
    "    flag = True\n",
    "    for i in range(2,n):\n",
    "        if flag and (expr[i].isdigit() or expr[i] == '.' or expr[i] == '%'):\n",
    "            output = output + 'n'\n",
    "            flag = False\n",
    "        if not (expr[i].isdigit() or expr[i] == '.' or expr[i] == '%'):\n",
    "            if expr[i] == '[' or expr[i] == '{':\n",
    "                output = output + '('\n",
    "            elif expr[i] == ']' or expr[i] == '}':\n",
    "                output = output + ')'\n",
    "            elif expr[i] == '-':\n",
    "                output = output + '+'\n",
    "            elif expr[i] == '/':\n",
    "                output = output + '*'\n",
    "            else: output = output + expr[i]\n",
    "            flag = True\n",
    "    return output\n",
    "    \n",
    "data['post_expression'] = data['equation'].apply(simplify)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Count of numbers in equations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"450\" style=\"\" viewBox=\"0 0 700 450\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"450\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-be3b8c\"><g class=\"clips\"/><g class=\"gradients\"/></defs><g class=\"bglayer\"/><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"/><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"><g class=\"trace\" stroke-linejoin=\"round\" style=\"opacity: 1;\"><g class=\"slice\"><path class=\"surface\" d=\"M350,235l0,-135a135,135 0 0 1 69.32581698084607,250.84011006528885Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(416.25803497266634,221.48484186929377)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(255, 255, 255); fill-opacity: 1; white-space: pre;\">41.4%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l-134.78412232434567,-7.631537804127246a135,135 0 0 1 134.78412232434567,-127.36846219587275Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(295.0317895235523,181.62829953730605)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">24.1%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l-94.77724681005951,96.13674368889903a135,135 0 0 1 -40.006875514286165,-103.76828149302628Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(260.78513995358634,274.1928170792875)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">13.5%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l-6.662427847338052,134.83549998119565a135,135 0 0 1 -88.11481896272146,-38.69875629229662Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(309.97948252387295,330.921267359486)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">11.6%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l37.372165642786584,129.7240194997369a135,135 0 0 1 -44.034593490124635,5.111480481458756Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(357.01006694128625,336.99240788723887)rotate(83.37881012002413)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">5.25%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l53.32813152739498,124.02060469049835a135,135 0 0 1 -15.955965884608396,5.703414809238552Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(25, 211, 243); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(409.32253313967135,433.2889428266128)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M395.43972063967135,362.12290032951825V428.4920678266128h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l60.69574059621531,120.58618110495473a135,135 0 0 1 -7.367609068820329,3.4344235855436125Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 102, 146); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(434.5456040934391,418.8826928266128)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">0.958%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M407.0377915934391,357.35885881350566V414.0858178266128h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l64.10537091230456,118.8086756941422a135,135 0 0 1 -3.409630316089256,1.7775054108125374Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(182, 232, 128); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(439.9146970546056,404.4764428266128)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">0.453%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M412.4068845546056,354.7095683735771V399.6795678266128h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l66.28593823797378,117.60601341730586a135,135 0 0 1 -2.180567325669216,1.2026622768363353Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 151, 255); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(442.70624020051207,390.0701928266128)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">0.294%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M415.19842770051207,353.21237255626465V385.2733178266128h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l68.34923024898204,116.41899640682202a135,135 0 0 1 -2.063292011008258,1.1870170104838422Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(254, 203, 82); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(444.82801303983456,375.6639428266128)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">0.281%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M417.32020053983456,352.0170526003644V370.8670678266128h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M350,235l69.32581698084604,115.84011006528885a135,135 0 0 1 -0.9765867318640034,0.5788863415331633Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(198, 202, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(446.34594462735333,361.2576928266128)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">0.134%</text></g></g></g></g><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-be3b8c\"><g class=\"clips\"/><clipPath id=\"legendbe3b8c\"><rect width=\"78\" height=\"219\" x=\"0\" y=\"0\"/></clipPath></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"legend\" pointer-events=\"all\" transform=\"translate(599.6,100)\"><rect class=\"bg\" shape-rendering=\"crispEdges\" width=\"78\" height=\"219\" x=\"0\" y=\"0\" style=\"stroke: rgb(68, 68, 68); stroke-opacity: 1; fill: rgb(255, 255, 255); fill-opacity: 1; stroke-width: 0px;\"/><g class=\"scrollbox\" transform=\"\" clip-path=\"url('#legendbe3b8c')\"><g class=\"groups\"><g class=\"traces\" transform=\"translate(0,14.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">3</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,33.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">4</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,52.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,71.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">5</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,90.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">6</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,109.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">7</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(25, 211, 243); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,128.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">8</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 102, 146); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,147.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">9</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(182, 232, 128); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,166.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">10</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 151, 255); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,185.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">other</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(254, 203, 82); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" transform=\"translate(0,204.5)\" style=\"opacity: 1;\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(198, 202, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"72.359375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g></g></g><rect class=\"scrollbar\" rx=\"20\" ry=\"3\" width=\"0\" height=\"0\" x=\"0\" y=\"0\" style=\"fill: rgb(128, 139, 164); fill-opacity: 1;\"/></g><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">Count of numbers in equations</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def CountNum(expr):\n",
    "    cnt = 0\n",
    "    for x in expr:\n",
    "        if x == 'n':\n",
    "            cnt = cnt + 1\n",
    "    return cnt\n",
    "\n",
    "\n",
    "tmp = data.loc[:,['post_expression','original_text']]\n",
    "tmp['number_cnt'] = tmp['post_expression'].apply(CountNum)\n",
    "\n",
    "\n",
    "cnt = tmp['number_cnt'].value_counts().reset_index()\n",
    "output = cnt.head(10)\n",
    "other_sum = cnt['number_cnt'].sum() - output['number_cnt'].sum()\n",
    "\n",
    "output = output.sort_values(['number_cnt'])\n",
    "output.loc[10] = ['other', other_sum]\n",
    "\n",
    "output.columns=['count of numbers','number of problems']\n",
    "\n",
    "fig = px.pie(\n",
    "    output,\n",
    "    names = 'count of numbers',\n",
    "    values = 'number of problems',\n",
    "    title = 'Count of numbers in equations'    \n",
    ")\n",
    "\n",
    "fig.show(\"svg\")    \n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Are numbers in equations as many as in problems?\n",
    "This result shows that about half of problems have useless parameters or potential parameters in problems"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"450\" style=\"\" viewBox=\"0 0 700 450\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"450\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-c0ce39\"><g class=\"clips\"/><g class=\"gradients\"/></defs><g class=\"bglayer\"/><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"/><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"><g class=\"trace\" stroke-linejoin=\"round\" style=\"opacity: 1;\"><g class=\"slice\"><path class=\"surface\" d=\"M342,235l0,-135a135,135 0 1 1 -32.91720996610151,265.9253882486036Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(408.9887373122914,248.0889840987043)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(255, 255, 255); fill-opacity: 1; white-space: pre;\">53.9%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M342,235l-32.91720996610148,130.9253882486036a135,135 0 0 1 32.91720996610148,-265.9253882486036Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(274.75660313354115,231.4732432171714)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">46.1%</text></g></g></g></g><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-c0ce39\"><g class=\"clips\"/><clipPath id=\"legendc0ce39\"><rect width=\"74\" height=\"48\" x=\"0\" y=\"0\"/></clipPath></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"legend\" pointer-events=\"all\" transform=\"translate(614.48,100)\"><rect class=\"bg\" shape-rendering=\"crispEdges\" style=\"stroke: rgb(68, 68, 68); stroke-opacity: 1; fill: rgb(255, 255, 255); fill-opacity: 1; stroke-width: 0px;\" width=\"74\" height=\"48\" x=\"0\" y=\"0\"/><g class=\"scrollbox\" transform=\"\" clip-path=\"url('#legendc0ce39')\"><g class=\"groups\"><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,14.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">true</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"68.171875\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,33.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">false</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"68.171875\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g></g></g><rect class=\"scrollbar\" rx=\"20\" ry=\"3\" width=\"0\" height=\"0\" style=\"fill: rgb(128, 139, 164); fill-opacity: 1;\" x=\"0\" y=\"0\"/></g><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">Are numbers in equation as many as in problems?</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def NuminProb(text):\n",
    "    prob = str(text)\n",
    "    cnt = 0\n",
    "    flag = True\n",
    "    \n",
    "    for w in prob:\n",
    "        if w.isdigit() or w == '.' or w == '%':\n",
    "            if flag:\n",
    "                cnt += 1\n",
    "                flag = False\n",
    "        else:\n",
    "            flag =True\n",
    "    return cnt\n",
    "\n",
    "def isSame(a, b):\n",
    "    if a == b:\n",
    "        return True\n",
    "    else:\n",
    "        return False\n",
    "\n",
    "tmp['num_in_prob'] = tmp['original_text'].apply(NuminProb)\n",
    "tmp['same count'] = tmp.apply(lambda row: isSame(row['number_cnt'], row['num_in_prob']), axis=1)\n",
    "same = tmp['same count'].value_counts().reset_index()\n",
    "\n",
    "fig = px.pie(\n",
    "    same,\n",
    "    names = 'index',\n",
    "    values = 'same count',\n",
    "    title = 'Are numbers in equation as many as in problems?'    \n",
    ")\n",
    "fig.show(\"svg\")   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Postfix expressions\n",
    "Some algorithm need postfix expressions instead of infix expressions.The reasons for that may be postfix expressions can help us build expression trees,and there are no brackets in postfix expressions,so postfix expressions can merge some template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>original_text</th>\n",
       "      <th>equation</th>\n",
       "      <th>segmented_text</th>\n",
       "      <th>content</th>\n",
       "      <th>verbs</th>\n",
       "      <th>word_count</th>\n",
       "      <th>keywords</th>\n",
       "      <th>topic</th>\n",
       "      <th>difficulty</th>\n",
       "      <th>post_expression</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>946</td>\n",
       "      <td>甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．</td>\n",
       "      <td>x=20/(4-1.5)*1.5</td>\n",
       "      <td>甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...</td>\n",
       "      <td>[甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]</td>\n",
       "      <td>[增加, 是]</td>\n",
       "      <td>24</td>\n",
       "      <td>[甲数, 商是, 乙数]</td>\n",
       "      <td>3</td>\n",
       "      <td>23</td>\n",
       "      <td>nnn+*n*</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21227</td>\n",
       "      <td>客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...</td>\n",
       "      <td>x=196/(80%+((3)/(3+2))-1)</td>\n",
       "      <td>客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...</td>\n",
       "      <td>[客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...</td>\n",
       "      <td>[相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]</td>\n",
       "      <td>63</td>\n",
       "      <td>[路程, 前进, 小时]</td>\n",
       "      <td>1</td>\n",
       "      <td>58</td>\n",
       "      <td>nnnnn+*+n+*</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16892</td>\n",
       "      <td>图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？</td>\n",
       "      <td>x=30*(1-(1/5))+5</td>\n",
       "      <td>图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...</td>\n",
       "      <td>[图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]</td>\n",
       "      <td>[借出, 回, 有]</td>\n",
       "      <td>28</td>\n",
       "      <td>[角有书, 图书]</td>\n",
       "      <td>3</td>\n",
       "      <td>33</td>\n",
       "      <td>nnnn*+*n+</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8502</td>\n",
       "      <td>甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...</td>\n",
       "      <td>x=(230-35)/3-48</td>\n",
       "      <td>甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...</td>\n",
       "      <td>[甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...</td>\n",
       "      <td>[相距, 相向, 相距, 已知]</td>\n",
       "      <td>38</td>\n",
       "      <td>[小时, 相距, 已知]</td>\n",
       "      <td>1</td>\n",
       "      <td>21</td>\n",
       "      <td>nn+n*n+</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23021</td>\n",
       "      <td>果园里有苹果树300棵，比桔树多20%，桔树有多少棵？</td>\n",
       "      <td>x=300/(1+20%)</td>\n",
       "      <td>果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？</td>\n",
       "      <td>[果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]</td>\n",
       "      <td>[有, 有]</td>\n",
       "      <td>17</td>\n",
       "      <td>[苹果树, 果园]</td>\n",
       "      <td>1</td>\n",
       "      <td>22</td>\n",
       "      <td>nnn+*</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      id                                      original_text  \\\n",
       "0    946              甲数除以乙数的商是1.5，如果甲数增加20，则甲数是乙的4倍．原来甲数=．   \n",
       "1  21227  客车和货车分别从A、B两站同时相向开出，5小时后相遇．相遇后，两车仍按原速度前进，当它们相距...   \n",
       "2  16892          图书角有书30本，第一天借出了(1/5)，第二天又还回5本，现在图书角有多少本书？   \n",
       "3   8502  甲、乙两车同时从相距230千米的两地相向而行，3小时后两车还相距35千米．已知甲车每小时行4...   \n",
       "4  23021                        果园里有苹果树300棵，比桔树多20%，桔树有多少棵？   \n",
       "\n",
       "                    equation  \\\n",
       "0           x=20/(4-1.5)*1.5   \n",
       "1  x=196/(80%+((3)/(3+2))-1)   \n",
       "2           x=30*(1-(1/5))+5   \n",
       "3            x=(230-35)/3-48   \n",
       "4              x=300/(1+20%)   \n",
       "\n",
       "                                      segmented_text  \\\n",
       "0  甲 数 除以 乙 数 的 商 是 1.5 ， 如果 甲 数 增加 20 ， 则 甲 数 是 ...   \n",
       "1  客车 和 货车 分别 从 A 、 B 两站 同时 相向 开出 ， 5 小时 后 相遇 ． 相...   \n",
       "2  图书 角 有 书 30 本 ， 第一天 借出 了 (1/5) ， 第 二 天 又 还 回 5...   \n",
       "3  甲 、 乙 两车 同时 从 相距 230 千米 的 两地 相向 而 行 ， 3 小时 后 两...   \n",
       "4        果园 里 有 苹果树 300 棵 ， 比 桔树 多 20% ， 桔树 有 多少 棵 ？   \n",
       "\n",
       "                                             content  \\\n",
       "0    [甲数, 除以, 乙数, 商是, 1.5, 甲数, 增加, 20, 甲数, 乙, 倍, 甲数]   \n",
       "1  [客车, 货车, A, B, 两站, 相向, 开出, 小时, 相遇, 相遇, 两车, 按原,...   \n",
       "2       [图书, 角有, 书, 30, 第一天, 借出, 第二天, 回, 图书, 角有, 本书]   \n",
       "3  [甲, 乙, 两车, 相距, 230, 千米, 两地, 相向, 而行, 小时, 两车, 相距...   \n",
       "4               [果园, 里, 苹果树, 300, 棵, 桔树, 20%, 桔树, 棵]   \n",
       "\n",
       "                                    verbs  word_count      keywords  topic  \\\n",
       "0                                 [增加, 是]          24  [甲数, 商是, 乙数]      3   \n",
       "1  [相向, 开出, 相遇, 相遇, 前进, 相距, 行, 已行, 未行, 求]          63  [路程, 前进, 小时]      1   \n",
       "2                              [借出, 回, 有]          28     [角有书, 图书]      3   \n",
       "3                        [相距, 相向, 相距, 已知]          38  [小时, 相距, 已知]      1   \n",
       "4                                  [有, 有]          17     [苹果树, 果园]      1   \n",
       "\n",
       "   difficulty post_expression  \n",
       "0          23         nnn+*n*  \n",
       "1          58     nnnnn+*+n+*  \n",
       "2          33       nnnn*+*n+  \n",
       "3          21         nn+n*n+  \n",
       "4          22           nnn+*  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def InfixToPostfix(infixexpr):\n",
    "    prec = {}\n",
    "    prec['^'] = 4\n",
    "    prec[\"*\"] = 3\n",
    "    prec[\"/\"] = 3\n",
    "    prec[\"+\"] = 2\n",
    "    prec[\"-\"] = 2\n",
    "    prec[\"(\"] = 1\n",
    "\n",
    "    opstack = Stack()\n",
    "    postfixList = []\n",
    "\n",
    "    for token in infixexpr:\n",
    "        if token == 'n':\n",
    "            postfixList.append(token)\n",
    "        elif token == \"(\":\n",
    "            opstack.push(token)\n",
    "        elif token == \")\":\n",
    "            topstack = opstack.pop()\n",
    "            while topstack != \"(\":\n",
    "                postfixList.append(topstack)\n",
    "                if opstack.isEmpty():\n",
    "                    print(infixexpr)\n",
    "                else :\n",
    "                    topstack = opstack.pop()\n",
    "        else:\n",
    "            while (not opstack.isEmpty()) and (prec[opstack.peek()] >= prec[token]):\n",
    "                postfixList.append(opstack.pop())\n",
    "            opstack.push(token)\n",
    "    while not opstack.isEmpty():\n",
    "        postfixList.append(opstack.pop())\n",
    "    return ''.join(postfixList)\n",
    "\n",
    "\n",
    "data['post_expression'] = data['post_expression'].apply(InfixToPostfix)\n",
    "\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Templates of postfix expressions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Template may be useful to solve math word problems. In fact,many algorithms are based on them.The result shows that 15 kinds of postfix templates can help us solve about 70% of problems."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": "<svg class=\"main-svg\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"700\" height=\"450\" style=\"\" viewBox=\"0 0 700 450\"><rect x=\"0\" y=\"0\" width=\"700\" height=\"450\" style=\"fill: rgb(255, 255, 255); fill-opacity: 1;\"/><defs id=\"defs-2153d9\"><g class=\"clips\"/><g class=\"gradients\"/></defs><g class=\"bglayer\"/><g class=\"layer-below\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"cartesianlayer\"/><g class=\"polarlayer\"/><g class=\"ternarylayer\"/><g class=\"geolayer\"/><g class=\"funnelarealayer\"/><g class=\"pielayer\"><g class=\"trace\" stroke-linejoin=\"round\" style=\"opacity: 1;\"><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l0,-135a135,135 0 0 1 123.4568950160517,189.62046386653623Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(395.0518863159567,199.72206128001272)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(255, 255, 255); fill-opacity: 1; white-space: pre;\">31.6%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l-85.40667182777483,-104.54998999188271a135,135 0 0 1 85.40667182777483,-30.45001000811729Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(299.5612113523557,144.60482141811724)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">10.9%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l-125.61691990225351,-49.45087900402605a135,135 0 0 1 40.210248074478685,-55.09911098785666Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(246.44414505614014,176.2652233840615)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">8.13%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l-133.9250863193623,17.002095587055162a135,135 0 0 1 8.30816641710878,-66.45297459108122Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(226.16584259134174,226.37761051291466)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">7.98%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l-111.19875072569894,76.5495776411854a135,135 0 0 1 -22.72633559366335,-59.54748205413023Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(231.46104748020585,278.7401081745399)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">7.59%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l-75.11042370903088,112.1758630466013a135,135 0 0 1 -36.08832701666806,-35.6262854054159Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(25, 211, 243); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(253.6540986614234,320.67830750570596)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">6.01%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l-34.72533317109597,130.45746907002444a135,135 0 0 1 -40.385090537934914,-18.28160602342315Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 102, 146); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(295.7559636827254,330.0104206265738)rotate(-65.64459027717817)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">5.25%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l3.9545694332464376,134.94206675680357a135,135 0 0 1 -38.67990260434241,-4.484597686779125Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(182, 232, 128); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(326.2420721254286,339.2504898863246)rotate(-83.3865814696486)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">4.61%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l30.60365326363546,131.48542279248744a135,135 0 0 1 -26.649083830389024,3.456643964316129Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 151, 255); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(342.969175538198,345.2942793622084)rotate(82.60944650721012)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">3.17%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l51.60727815470282,124.74649831343217a135,135 0 0 1 -21.003624891067357,6.738924479055271Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(254, 203, 82); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(364.5367542463433,347.43558259301744)rotate(72.21138070978327)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(68, 68, 68); fill-opacity: 1; white-space: pre;\">2.6%</text></g></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l70.73470085402647,114.985225551335a135,135 0 0 1 -19.127422699323652,9.761272762097164Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(198, 202, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(418.5606982514731,372.9325533056928)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2.53%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M394.8653857514731,355.2467855353019V368.1356783056928h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l87.98905424469704,102.38616280105315a135,135 0 0 1 -17.254353390670573,12.59906275028186Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(247, 167, 153); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(436.80681734513075,358.5263033056928)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2.52%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M413.11150484513075,344.0275575086121V353.7294283056928h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l102.85017176501346,87.44622443483327a135,135 0 0 1 -14.861117520316412,14.939938366219877Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(51, 255, 201); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(452.90687389982753,344.1200533056928)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">2.49%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M429.21156139982753,330.20660173752157V339.3231783056928h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l112.10449894478968,75.21689515220399a135,135 0 0 1 -9.254327179776226,12.22932928262928Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(224, 198, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(464.84644690491797,329.7138033056928)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1.81%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M441.15113440491797,316.463079129961V324.9169283056928h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l118.50301403082538,64.66866061400995a135,135 0 0 1 -6.398515086035701,10.548234538194038Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(255, 219, 192); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(472.6196278463247,315.3075533056928)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1.45%</text></g><path class=\"textline\" stroke-width=\"1.5\" d=\"M448.9243153463247,305.01590838396794V310.5106783056928h3.6015625\" fill=\"none\" style=\"stroke: rgb(42, 63, 95); stroke-opacity: 1;\"/></g><g class=\"slice\"><path class=\"surface\" d=\"M333.5,235l123.45689501605169,54.62046386653625a135,135 0 0 1 -4.953880985226306,10.048196747473703Z\" style=\"pointer-events: all; stroke-width: 0; fill: rgb(122, 230, 248); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/><g class=\"slicetext\"><text data-notex=\"1\" class=\"slicetext\" transform=\"translate(478.27954335175974,300.9013033056928)\" text-anchor=\"middle\" x=\"0\" y=\"0\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">1.32%</text></g></g></g></g><g class=\"treemaplayer\"/><g class=\"sunburstlayer\"/><g class=\"glimages\"/><defs id=\"topdefs-2153d9\"><g class=\"clips\"/><clipPath id=\"legend2153d9\"><rect width=\"114\" height=\"270\" x=\"0\" y=\"0\"/></clipPath></defs><g class=\"layer-above\"><g class=\"imagelayer\"/><g class=\"shapelayer\"/></g><g class=\"infolayer\"><g class=\"legend\" pointer-events=\"all\" transform=\"translate(574.26,100)\"><rect class=\"bg\" shape-rendering=\"crispEdges\" style=\"stroke: rgb(68, 68, 68); stroke-opacity: 1; fill: rgb(255, 255, 255); fill-opacity: 1; stroke-width: 0px;\" width=\"114\" height=\"270\" x=\"0\" y=\"0\"/><g class=\"scrollbox\" transform=\"\" clip-path=\"url('#legend2153d9')\"><g class=\"groups\"><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,14.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">others</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(99, 110, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,33.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(239, 85, 59); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,52.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn+n*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(0, 204, 150); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,71.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nnn+*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(171, 99, 250); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,90.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn*n*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 161, 90); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,109.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn*n+</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(25, 211, 243); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,128.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nnn**</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 102, 146); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,147.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nnnn*+*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(182, 232, 128); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,166.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn+n+</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 151, 255); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,185.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn+</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(254, 203, 82); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,204.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nnn**nn**</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(198, 202, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,223.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nnn*+</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(247, 167, 153); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,242.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn*nn*+</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(51, 255, 201); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,261.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn*nn+*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(224, 198, 253); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,280.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nnn*+n*</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(255, 219, 192); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g><g class=\"traces\" style=\"opacity: 1;\" transform=\"translate(0,299.5)\"><text class=\"legendtext\" text-anchor=\"start\" x=\"40\" y=\"4.680000000000001\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 12px; fill: rgb(42, 63, 95); fill-opacity: 1; white-space: pre;\">nn*nn**</text><g class=\"layers\" style=\"opacity: 1;\"><g class=\"legendfill\"/><g class=\"legendlines\"/><g class=\"legendsymbols\"><g class=\"legendpoints\"><path class=\"legendpie\" d=\"M6,6H-6V-6H6Z\" transform=\"translate(20,0)\" style=\"stroke-width: 0; fill: rgb(122, 230, 248); fill-opacity: 1; stroke: rgb(68, 68, 68); stroke-opacity: 1;\"/></g></g></g><rect class=\"legendtoggle\" x=\"0\" y=\"-9.5\" width=\"108.484375\" height=\"19\" style=\"fill: rgb(0, 0, 0); fill-opacity: 0;\"/></g></g></g><rect class=\"scrollbar\" rx=\"20\" ry=\"3\" width=\"0\" height=\"0\" style=\"fill: rgb(128, 139, 164); fill-opacity: 1;\" x=\"0\" y=\"0\"/></g><g class=\"g-gtitle\"><text class=\"gtitle\" x=\"35\" y=\"50\" text-anchor=\"start\" dy=\"0em\" style=\"font-family: 'Open Sans', verdana, arial, sans-serif; font-size: 17px; fill: rgb(42, 63, 95); opacity: 1; font-weight: normal; white-space: pre;\">Templates of postfix expressions</text></g></g></svg>"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "ds = data['post_expression'].value_counts().reset_index()\n",
    "ds = ds.sort_values(['post_expression'])\n",
    "\n",
    "output = ds.tail(15)\n",
    "other_sum = ds['post_expression'].sum() - output['post_expression'].sum()\n",
    "\n",
    "output.columns = [\n",
    "    'post_expression',\n",
    "    'percent'\n",
    "]\n",
    "\n",
    "output = output.sort_values(['percent'])\n",
    "output.loc[15] = ['others', other_sum]\n",
    "\n",
    "fig = px.pie(\n",
    "    output,\n",
    "    names = 'post_expression',\n",
    "    values = 'percent',\n",
    "    title = 'Templates of postfix expressions',\n",
    ")\n",
    "\n",
    "fig.show(\"svg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reference\n",
    "@inproceedings{Liu2019TreestructuredDF,\n",
    "  title={Tree-structured Decoding for Solving Math Word Problems},\n",
    "  author={Qianying Liu and Wenyv Guan and Sujian Li and Daisuke Kawahara},\n",
    "  booktitle={EMNLP/IJCNLP},\n",
    "  year={2019}\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@inproceedings{Xie2019AGT,\n",
    "  title={A Goal-Driven Tree-Structured Neural Model for Math Word Problems},\n",
    "  author={Zhipeng Xie and Shichao Sun},\n",
    "  booktitle={IJCAI},\n",
    "  year={2019}\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@inproceedings{Wang2019TemplateBasedMW,\n",
    "  title={Template-Based Math Word Problem Solvers with Recursive Neural Networks},\n",
    "  author={Lei Wang and D. Zhang and Jipeng Zhang and Xing Xu and L. Gao and B. Dai and H. Shen},\n",
    "  booktitle={AAAI},\n",
    "  year={2019}\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@article{Lee2020SolvingAW,\n",
    "  title={Solving Arithmetic Word Problems with a Templatebased Multi-Task Deep Neural Network (T-MTDNN)},\n",
    "  author={D. Lee and G. Gweon},\n",
    "  journal={2020 IEEE International Conference on Big Data and Smart Computing (BigComp)},\n",
    "  year={2020},\n",
    "  pages={271-274}\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
  },
  "kernelspec": {
   "display_name": "Python 3.9.7 64-bit",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
